A pilot investigation of information extraction in the semantic annotation of archaeological reports
Abstract
The paper discusses a prototype investigation of semantic annotation, a form of metadata that assigns conceptual entities to textual instances, in this case in archaeological grey literature. Information Extraction (IE), a Natural Language Processing (NLP) technique, is central to the annotation process, while Knowledge Organization Systems (KOS) are explored as a means of associating semantic annotations with both ontological and terminological references. The annotation process follows a rule-based information extraction approach using the GATE NLP toolkit, together with the CIDOC CRM ontology, its CRM-EH archaeological extension, and English Heritage thesauri and glossaries. Results from an initial evaluation suggest that these information extraction techniques can be applied to archaeological grey literature reports. Further work is discussed, drawing on the evaluation and on the characteristics of the archaeology domain.
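As a rough illustration of what rule-based, vocabulary-driven annotation can look like, the sketch below matches terms from a small glossary against a sentence and attaches a concept label to each match. It is not the paper's GATE pipeline: the glossary entries, the concept labels (`Context`, `Find`, `Period`), and the names `Annotation` and `annotate` are all assumptions made for this example, and the code is plain Python rather than GATE rules.

```python
import re
from typing import NamedTuple


class Annotation(NamedTuple):
    start: int      # character offset where the match begins
    end: int        # character offset just past the match
    text: str       # the matched surface string
    concept: str    # concept label assigned from the glossary


# Hypothetical glossary: surface term -> illustrative concept label.
# A real system would load terms from thesauri and map them to ontology classes.
GLOSSARY = {
    "pit": "Context",
    "ditch": "Context",
    "pottery": "Find",
    "coin": "Find",
    "roman": "Period",
    "medieval": "Period",
}


def annotate(text: str, glossary: dict[str, str] = GLOSSARY) -> list[Annotation]:
    """Return one annotation per whole-word, case-insensitive glossary match."""
    # Longest terms first so that longer entries win over their prefixes.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    return [
        Annotation(m.start(), m.end(), m.group(0), glossary[m.group(0).lower()])
        for m in pattern.finditer(text)
    ]


if __name__ == "__main__":
    for ann in annotate("A Roman pit contained pottery."):
        print(ann)
```

Exact whole-word matching is the simplest possible rule; it misses plurals and spelling variants, which a fuller pipeline would typically handle with morphological analysis and contextual patterns.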
Similar resources
Semantic Annotation for Indexing Archaeological Context: A Prototype Development and Evaluation
The paper discusses the process of developing Semantic Annotations, a form of metadata for assigning conceptual entities to textual instances, in this case archaeological grey literature. The use of Information Extraction (IE), a Natural Language Processing (NLP) technique is central to the annotation process. The paper explores the use of Ontology Oriented Information Extraction (OOIE) methods...
Presenting a method for extracting structured domain-dependent information from Farsi Web pages
Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...
Semantic-Based Image Retrieval in the VQ Compressed Domain using Image Annotation Statistical Models
A Pilot Study on the Semantic Classification of Two German Prepositions: Combining Monolingual and Multilingual Evidence
This paper reports on the annotation and maximum-entropy modeling of the semantics of two German prepositions, mit (‘with’) and auf (‘on’). 500 occurrences of each preposition were sampled from a treebank and annotated with syntactosemantic classes by two annotators. The classification is guided by a perspective of information extraction, relies on linguistic tests and aims at the separation of...
Some Remarks on Automatic Semantic Annotation of a Medical Corpus
In this paper we present arguments that elaborating a rule based information extraction system is a good starting point for obtaining a semantic annotated corpus of medical data. Our claim is supported by evaluation results of the automatic annotation of a corpus containing hospital discharge reports of diabetic patients.
Journal: IJMSO
Volume: 7
Publication date: 2012